
NaN-friendly convolution #155

Merged (24 commits) on Mar 22, 2012
Conversation

@astrofrog (Member)

This pull request implements convolution functions for multi-dimensional arrays, with proper support for NaN values (which scipy's convolve lacks, but which is needed for astronomical images).

The current code has the following limitations:

  • Only 2D convolution is supported at this time
  • No documentation has been added
  • The tests are not yet exhaustive

I want to address all of these points before we consider merging the code. However, I wanted to open the pull request now to get feedback on the existing code before extending it to 1D and 3D+ and writing more tests, since those steps will involve significantly more code.

The only function intended for users is:

from astropy.nddata.convolve import convolve

This calls different Cython functions depending on dimensionality and boundary treatment. The reason for implementing four separate Cython functions is that it is significantly faster than repeatedly checking the boundary option inside a single Cython function.
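To illustrate the dispatch, here is a structural sketch (the routine names are hypothetical, modeled on the boundary_none.pyx module discussed below; the real wrapper also validates shapes and dtypes):

def convolve(array, kernel, boundary=None):
    # Pick the specialized compiled routine once, up front, so the
    # boundary option is never re-checked inside the hot loop.
    if array.ndim == 2:
        if boundary is None:
            return convolve2d_boundary_none(array, kernel)  # hypothetical name
        elif boundary == 'wrap':
            return convolve2d_boundary_wrap(array, kernel)  # hypothetical name
        # ... one branch per remaining boundary option ...
    raise NotImplementedError("only 2D arrays are supported so far")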

The convolve function's docstring should explain all the options.

I decided to include this in astropy.nddata to mirror scipy.ndimage (rather than putting it in astropy.tools), which I think is the right thing to do. I think convolve should be callable directly as above, but of course we can also add a convolve method to NDData objects that relies on nddata.convolve.

The treatment of the edges and NaN values is inspired by IDL's CONVOL function (e.g. http://star.pst.qub.ac.uk/idl/CONVOL.html).
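As a rough illustration of that treatment, here is a minimal pure-NumPy sketch (not the PR's Cython code; it assumes a 2D image and an odd-sized, symmetric kernel, and it renormalizes by the kernel weight that actually contributed, so NaN pixels are effectively interpolated over):

import numpy as np

def convolve_nan_naive(image, kernel):
    ny, nx = image.shape
    wy, wx = kernel.shape[0] // 2, kernel.shape[1] // 2
    result = np.zeros_like(image, dtype=float)
    for i in range(ny):
        for j in range(nx):
            top = bot = 0.0
            # Loop over the kernel footprint, clipped at the array edges.
            for ii in range(max(0, i - wy), min(ny, i + wy + 1)):
                for jj in range(max(0, j - wx), min(nx, j + wx + 1)):
                    val = image[ii, jj]
                    if not np.isnan(val):        # skip NaNs entirely
                        ker = kernel[wy + ii - i, wx + jj - j]
                        top += val * ker         # weighted sum of valid pixels
                        bot += ker               # kernel weight that contributed
            result[i, j] = top / bot if bot != 0.0 else np.nan
    return result

(This is written as a correlation; for the symmetric kernels typical in astronomy, correlation and convolution coincide.)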

Performance-wise, the speed is very similar to scipy's convolve when boundary=None, and slightly worse for the other boundary options, but that is the price to pay for handling NaN values correctly.

Let me know what you think!

@eteq (Member) commented Feb 9, 2012

I definitely like the overall scheme, but one organizational thought: I'm concerned about how astropy/nddata/convolve/__init__.py works. As it stands, if I do from astropy.nddata.convolve import convolve, I get a function, even though there's a module astropy/nddata/convolve/convolve.py. Right now this isn't necessarily a big deal because there's only one function in the module, but it still strikes me as quite confusing, and it might get much worse if more functions are added later. In addition, requiring two nested layers of imports starts getting confusing when the user level only involves a single function (and, in the future, probably only a few more).

Instead, how about leaving this __init__.py empty, moving what's in it to astropy/nddata/__init__.py, and changing the name of the package from "convolve" to "convolution" or something like that (e.g., in astropy/nddata/__init__.py, the import would be from .convolution.convolve import convolve). Then nothing is masked, and the convolution still lives in the nddata module.
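That is, something like:

# astropy/nddata/__init__.py  (sketch of the proposed layout)
from .convolution.convolve import convolve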

@astrofrog (Member, Author)

So if I understand correctly what you are saying, the user would then import it with

from astropy.nddata import convolve

This does make sense, since scipy has the same:

from scipy.ndimage import convolve

On a side note, it would be possible to add functions such as gaussian_filter() that would again mirror the scipy capabilities, but with support for NaN values; these could then be imported with:

from astropy.nddata import gaussian_filter

Anyway, I'll implement your suggestion, but I will wait for other comments before doing so.

boundary : str, optional
    A flag indicating how to handle boundaries:
        * None : set the ``result`` values to zero where the kernel
          extends eyond the edge of the array (default)
Member

Typo: eyond

Member Author

Fixed

@eteq (Member) commented Feb 11, 2012

@astrofrog - yep, that's what I had in mind, as long as the function convolve doesn't overshadow the package convolve (and that would be fixed by changing the package name to convolution). I think in general we want the most user-friendly stuff in the first-level subpackage, but always leave the deeper subpackages/modules accessible.

@astrofrog (Member, Author)

I've implemented all the comments! Is there anything else you can think of before I extend this to more dimensions? How many dimensions should this be extended to? I'm not sure if many people would need 4D convolution and above, but 1D, 2D, and 3D certainly seem useful.

@astrofrog (Member, Author)

I'm having a little issue with optimization. On line 71 of boundary_none.pyx, I want val * ker to equal zero when val is NaN and ker is zero (instead of NaN, which is the normal result). However, I can't figure out a way to do this without doubling the runtime. For example, doing:

if ker != 0.:
    top += val * ker
    bot += ker

literally doubles the runtime. Does anyone have an idea how to do this differently? Is there an efficient way to override the multiplication (or use a custom multiplication function) so that NaN * 0 = 0?

@astrofrog (Member, Author)

If I try a different if statement such as:

if not isnan(val):
    top += val * ker
    bot += ker

then I don't get much of a slowdown. isnan was defined as a macro with:

cdef extern from "math.h":
    bint isnan(double x)

The example I'm using contains no zeros and no NaNs, so in both cases the condition is always true. But ker != 0. seems to take much longer than isnan(val). Any ideas on how to speed up ker != 0.?

@astrofrog (Member, Author)

I've found a way around my previous optimization question that seems to work fine. I've now added some basic documentation. The next steps are to add more tests, extend to 1 and 3 dimensions, and add examples to the documentation.

@eteq (Member) commented Feb 15, 2012

I agree that 1D convolution should be the next priority, followed by 3D. I can't think of any particular reason why you'd want to implement 4D (for that matter, offhand I can't think of any common cases for 3D, but I imagine those do exist).

I'm also getting 3 test failures for your current version in OS X 10.6 (py2.7.2 32-bit): http://paste.pocoo.org/show/551273 http://paste.pocoo.org/show/551274 http://paste.pocoo.org/show/551275

for ii in range(iimin, iimax):
    for jj in range(jjmin, jjmax):
        iii = ii % nx
        jjj = jj % ny
Member

This % operator defaults (in Cython > 0.12) to checking whether nx or ny is 0 (raising a ZeroDivisionError if so), at a potentially significant performance penalty (http://wiki.cython.org/enhancements/compilerdirectives); you can fix this by adding the cdivision directive I noted above.
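For reference, a minimal sketch of the directive (the function here is illustrative, not from the PR):

cimport cython

@cython.cdivision(True)    # or as a file-level comment: # cython: cdivision=True
cdef int wrap_index(int ii, int nx):
    # With cdivision enabled, % compiles to C's raw modulo: the
    # zero-divisor check disappears, but note that C truncates
    # toward zero, so a negative ii yields a negative remainder
    # (unlike Python), which matters if ii can go negative.
    return ii % nx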

Member Author

Weirdly, it turns out that if I add the cdivision directive, the tests segfault and do not complete:

https://jenkins.shiningpanda.com/astropy/job/astrofrog-staging/PYTHON_ENV=env2.7-numpy1.6.1/5/console

It seems the tests segfault on both Linux and Mac, so I'm not sure what's going on. For now, I'm leaving the directive out.

Member

I just ran the tests after adding @cython.cdivision(True) to all four pyx files and got no segfaults (although one of the tests did fail). It's only a ~5-10% speedup (for a 500x500 array with a 3x3 kernel), though, so perhaps it's not a big deal.

@eteq (Member) commented Feb 15, 2012

@astrofrog - you may already know this, but one very useful trick for optimizing Cython code: run cython -a foo.pyx. That generates a file foo.html showing the .pyx source, and clicking on a line shows the C code it expands into. More importantly, lines that expand into slow C code are highlighted in yellow, so it reveals where you might have forgotten to declare a cdef, or where compiler directives could speed things up (that's what prompted the comments I made above).

@astrofrog (Member, Author)

No, this is not ready yet - I need to fix the normalization as you suggested, and implement 1-D and 3-D convolution. I was away for the last week, but should be able to make progress on this soon.

@astrofrog (Member, Author)

@eteq - you can now pass nested lists as input, integer values will work, and you can normalize the kernel on the fly (though the default is not to).

I've implemented 1-d and 3-d convolution, and I'm now going to improve the docs a little before a final review.

@astrofrog (Member, Author)

This pull request is now ready for final review!

@eteq (Member) commented Mar 17, 2012

I think it might be good to organize the documentation you added slightly differently (to match other packages), but we can address that later after the content itself is merged in. (And overall, I like the doc content a lot!)

All tests pass for me, so as far as I'm concerned, you can merge when ready.

@astrofrog (Member, Author)

Agree about the documentation structure - I just wanted to have something there, but it needs to be integrated better into the rest of the docs.

@taldcroft, @iguananaut, @mdboom - I plan to merge this at the start of the week, in case you want to review this too.

@keflavich (Contributor)

I'm a latecomer to this thread, but I have an alternative solution to the NaN-ignoring convolution problem, implemented here: http://code.google.com/p/agpy/source/browse/trunk/AG_fft_tools/convolve_nd.py

It uses FFTs (numpy's, or FFTW3's if it is installed) and gets around NaNs by setting them to zero, then creating a "weight" array that is 1 where the image is not NaN and 0 where it is. The weight array is smoothed with the same kernel as the image and then divided out. Anywhere the kernel does not encounter NaNs, the weight stays 1; near a NaN, the weight is decreased, so the average is taken over fewer pixels.

The implementation posted above works in N dimensions. It could use more and better unit tests, and it would be especially good to compare it directly to the CONVOL-style approach implemented here. I think it would be good to include both implementations, but this pull request is obviously more mature in terms of astropy compliance.
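For concreteness, a minimal NumPy/SciPy sketch of the weight trick described above (it assumes a kernel normalized to sum to 1; the linked implementation adds padding, N-dimensional support, and optional FFTW):

import numpy as np
from scipy.signal import fftconvolve

def convolve_fft_nan(image, kernel):
    nan_mask = np.isnan(image)
    filled = np.where(nan_mask, 0.0, image)    # NaNs -> 0
    weight = (~nan_mask).astype(float)         # 1 where valid, 0 at NaNs
    # Smooth both with the same kernel; dividing out the smoothed
    # weight renormalizes pixels whose footprint overlapped NaNs.
    smoothed = fftconvolve(filled, kernel, mode='same')
    weight_sm = fftconvolve(weight, kernel, mode='same')
    return smoothed / weight_sm    # NaN where the whole footprint was NaN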

@eteq (Member) commented Mar 21, 2012

@keflavich Have you done any performance testing? I suspect this PR is faster because it's all Cython-optimized... and I'm also concerned about the memory cost of an FFT-based approach.

Is there some other advantage to the FFT approach aside from allowing arbitrary dimensionality?

I'm inclined to suggest that we merge this pull request now, and then @keflavich, if you want to re-write your code into an astropy-compliant form, you can issue a new PR later that either adds an n-dimensional option or replaces this one once it's ready (if there's a compelling reason to replace this one). Does that sound good?

@keflavich (Contributor)

I was under the impression that FFTs are generally the fastest way to perform convolutions. They increase execution time as n log n, while a straight convolution goes as n^2. So the answer will depend on image size: for small images I would expect this pull request to be faster, but for large images FFTs should win.

The memory cost is certainly an issue. Again, I like having both options. My version of the FFT convolution can be extra memory-intensive in favor of speed (it allows padding out to the nearest 2^n, 3^n, or 5^n shape).

While Cython optimization is probably pretty fast, numpy's FFTs and FFTW are really designed to be fast. My comparison of the different FFTs is here:
http://code.google.com/p/agpy/source/browse/trunk/tests/test_ffts.py

I haven't written tests for convolution yet... I'd like to see that comparison, though.


@eteq (Member) commented Mar 22, 2012

@keflavich I see your point here on the O(n log n) scaling, so I can definitely see the merits of having both available. I'm mainly concerned about the potential for confusion in having a variety of convolution options. But I think this could be easily remedied by an explicit enough naming scheme - e.g., where this pull request centers around a driver function named convolve, there could be a second one with a driver function convolvefft, with a note in the docstrings indicating basically "use convolve when you are running into memory issues, and convolvefft to scale better with size."

Would you be fine with us merging this one as is, and with you submitting a pull request later for the FFT-based version (after you've got the docstrings and code up to astropy standards)?

@keflavich (Contributor)

Absolutely, that's essentially what I intended; I just wanted to bring this to everyone's attention. I agree that convolvefft is a good name and a good way to distinguish the two.


@astrofrog (Member, Author)

I also agree that we could always add a convolvefft (or rather convolve_fft!) function that would live side by side with the current convolve. Note also that you could try reusing some of the existing tests for convolve, since at the end of the day one would hope that both techniques give the same result.

I'm going to merge this PR now, since we agree that is the way to proceed.

astrofrog merged commit f7f667a into astropy:master on Mar 22, 2012
@embray (Member) commented Mar 22, 2012

On the convolvefft issue, I'm kind of in favor of having just a single n-dimensional convolve function in the API. There's no reason, however, that there can't be a choice of algorithms implementing it, selectable via a simple keyword argument. Then we can have a convolution shootout over a variety of images to determine which one makes the best default :)
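Something along these lines, perhaps (a hypothetical sketch, not the eventual astropy API; the NaN handling discussed above is omitted, with scipy calls standing in for the two strategies):

from scipy.ndimage import convolve as direct_convolve
from scipy.signal import fftconvolve

def convolve(array, kernel, method='direct'):
    # One public entry point; the algorithm is just a keyword.
    if method == 'direct':
        return direct_convolve(array, kernel, mode='constant')
    elif method == 'fft':
        return fftconvolve(array, kernel, mode='same')
    raise ValueError("method must be 'direct' or 'fft'")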

But the way I see it, they're just different strategies for doing the same thing. The advantages and disadvantages of each strategy can be explained in the docstring (just as @astrofrog's implementation is already doing for the different boundary strategies).

The downside I see to this approach is that each implementation has some optional arguments that are in no way compatible with the other, which could lead to confusion. Though I tend to think that with a bit of cleaning up, and well organized documentation, this can be mitigated.

@eteq (Member) commented Mar 22, 2012

@iguananaut - perhaps you should copy this comment into #182 now that it exists as a separate PR?

@keflavich (Contributor)

Noticed while writing the convolve_fft (#182) tests: convolve may modify the kernel array in place. This happens when kernel.dtype.kind == 'f'. I think it can be solved by replacing the initial checking routines with kernel = np.asarray(kernel, dtype='float'), as @mdboom suggested on #182. If you want to keep the current behavior, it should be documented.
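A sketch of that fix (the helper name is hypothetical; note that np.asarray alone does not copy an input that is already a float ndarray, so an explicit copy is the safest way to guarantee the caller's kernel is never touched):

import numpy as np

def _sanitize_kernel(kernel):
    # Coerce lists/integer arrays to float and always copy, so any
    # later in-place normalization cannot modify the caller's array.
    return np.array(kernel, dtype=float, copy=True)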

@astrofrog (Member, Author)

@keflavich - good catch! I'll fix this.
